The edge recombination operator (ERO) is an operator that creates a path similar to a set of existing paths (the parents) by looking at their edges rather than their vertices. Its main application is crossover in genetic algorithms when a genotype with non-repeating gene sequences is needed, such as for the travelling salesman problem.
ERO is based on an adjacency matrix, which lists the neighbors of each node in any parent.
For example, in a travelling salesman problem such as the one depicted, the node map for the parents CABDEF and ABCEFD (see illustration) is generated by taking the first parent, say, 'ABCEFD' and recording its immediate neighbors, including those that roll around the end of the string.
Therefore, the cyclic sequence
... -> [A] <-> [B] <-> [C] <-> [E] <-> [F] <-> [D] <- ...
is converted into the following adjacency matrix by taking each node in turn and listing its connected neighbors:
A: B D
B: A C
C: B E
D: F A
E: C F
F: E D
With the same operation performed on the second parent (CABDEF), the following is produced:
A: C B
B: A D
C: F A
D: B E
E: D F
F: E C
The two sets of lists are then merged by taking, for each node, the union of its neighbors in both parents and discarding duplicates. In our example, this gives:
A: B C D   = {B,D} ∪ {C,B}
B: A C D   = {A,C} ∪ {A,D}
C: A B E F = {B,E} ∪ {F,A}
D: A B E F = {F,A} ∪ {B,E}
E: C D F   = {C,F} ∪ {D,F}
F: C D E   = {E,D} ∪ {E,C}
The result is another adjacency matrix, which stores the links for a network described by all the links in the parents. Note that more than two parents can be employed here to give more diverse links. However, this approach may result in sub-optimal paths.
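The construction of the merged neighbor lists can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name neighbor_map and the choice of a dictionary of sets are assumptions made here, and each parent is assumed to be a cyclic tour over the same set of nodes.

```python
def neighbor_map(parents):
    """Return, for every node, the union of its cyclic neighbors in all parents.

    `parents` is a list of tours (strings or lists) over the same nodes.
    The name and the dict-of-sets layout are illustrative choices.
    """
    merged = {}
    for tour in parents:
        n = len(tour)
        for i, node in enumerate(tour):
            # tour[i - 1] wraps to the last node when i == 0;
            # (i + 1) % n wraps to the first node at the end.
            merged.setdefault(node, set()).update((tour[i - 1], tour[(i + 1) % n]))
    return merged


links = neighbor_map(["ABCEFD", "CABDEF"])
for node in sorted(links):
    print(node + ":", " ".join(sorted(links[node])))
```

Using sets makes the "ignore duplicates" step implicit, and passing more than two tours in the list merges the links of all of them.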
Then, to create a path K, the following algorithm is employed:
Let K be the empty list
Let N be the first node of a random parent.
While Length(K) < Length(Parent):
    K := K, N   (append N to K)
    Remove N from all neighbor lists
    If N's neighbor list is non-empty
        then let N* be the neighbor of N with the fewest neighbors in its list (or a random one, should there be multiple)
        else let N* be a randomly chosen node that is not in K
    N := N*
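The pseudocode above could be translated into Python roughly as follows. This is a sketch under a few assumptions that are not part of the original description: the function name edge_recombination and the optional rng parameter are invented for illustration, node labels are assumed to be sortable (candidates are sorted only so that a seeded generator gives reproducible results), and the loop exits as soon as the path is complete instead of looking for a further node.

```python
import random


def edge_recombination(parents, rng=None):
    """Build one child tour from the merged neighbor lists of `parents`."""
    rng = rng or random.Random()
    size = len(parents[0])

    # Merged neighbor lists: union of each node's cyclic neighbors.
    neighbors = {}
    for tour in parents:
        for i, node in enumerate(tour):
            neighbors.setdefault(node, set()).update(
                (tour[i - 1], tour[(i + 1) % size])
            )

    current = rng.choice(parents)[0]  # first node of a random parent
    child = []
    while len(child) < size:
        child.append(current)
        for links in neighbors.values():
            links.discard(current)  # remove N from all neighbor lists
        if len(child) == size:
            break  # path complete, no next node needed

        candidates = neighbors[current]
        if candidates:
            # Neighbor with the fewest remaining neighbors, random tie-break.
            fewest = min(len(neighbors[c]) for c in candidates)
            ties = sorted(c for c in candidates if len(neighbors[c]) == fewest)
            current = rng.choice(ties)
        else:
            # Dead end: fall back to a random node not yet in the path.
            current = rng.choice(sorted(set(neighbors) - set(child)))
    return child


print("".join(edge_recombination(["CABDEF", "ABCEFD"], random.Random(0))))
```

The result depends on the random choices, so different seeds can give different children; every result is a permutation of the parents' nodes that starts with the first node of one of the parents.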
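To step through the example, a starting node is chosen at random from the parents' first nodes, {A, C}. Suppose A is chosen:
K = A. With A removed from all lists, its neighbors B, C and D have 2, 3 and 3 remaining neighbors respectively, so B is next.
K = AB. B's remaining neighbors C and D both have 2 neighbors left (E and F), so the tie is broken at random, say in favor of D.
K = ABD. D's remaining neighbors E and F both have 2 neighbors left; suppose F is picked.
K = ABDF. F's remaining neighbors C and E both have 1 neighbor left; suppose C is picked.
K = ABDFC. C's only remaining neighbor is E, so E is next.
K = ABDFCE. All six nodes have been placed, so the path is complete.
Note that the only edge in ABDFCE that appears in neither parent is AE, the edge that closes the tour.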
If one were to use an indirect representation for these parents (where each number in turn indexes and removes an element from an initially sorted set of nodes) and cross them with simple one-point crossover, one would get the following:
The parents:
31|1111 (CABDEF)
11|1221 (ABCEFD)
The children:
11|1111 (ABCDEF)
31|1221 (CABEFD)
Each child introduces two edges that appear in neither parent: CD and FA in the first child, and BE and CD in the second.
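The indirect representation and the count of introduced edges can be checked with a short Python sketch. The helper names (encode, decode, one_point_crossover, cyclic_edges) are hypothetical, and two assumptions are made: indices are 1-based, and every tour is treated as cyclic, so the edge from the last node back to the first is counted.

```python
def encode(tour, nodes):
    """Indirect representation: 1-based index of each node in a shrinking sorted pool."""
    pool = sorted(nodes)
    genes = []
    for node in tour:
        i = pool.index(node)
        genes.append(i + 1)
        pool.pop(i)  # the node is removed once it has been used
    return genes


def decode(genes, nodes):
    """Inverse of encode: each gene indexes and removes one node from the pool."""
    pool = sorted(nodes)
    return "".join(pool.pop(g - 1) for g in genes)


def one_point_crossover(a, b, cut):
    """Swap the tails of two gene lists after position `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def cyclic_edges(tour):
    """Undirected edges of a cyclic tour, including the closing edge."""
    return {frozenset((tour[i - 1], tour[i])) for i in range(len(tour))}


nodes = "ABCDEF"
p1, p2 = "CABDEF", "ABCEFD"
g1, g2 = encode(p1, nodes), encode(p2, nodes)
parent_edges = cyclic_edges(p1) | cyclic_edges(p2)

for genes in one_point_crossover(g1, g2, 2):
    tour = decode(genes, nodes)
    introduced = cyclic_edges(tour) - parent_edges
    print(tour, sorted("".join(sorted(e)) for e in introduced))
```

Because the gene at position i can never exceed the number of nodes still in the pool, swapping tails at the same cut point always yields a decodable child, which is what makes plain one-point crossover usable with this representation.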
Frequent edge introduction is undesirable in these kinds of problems because very few of the possible edges tend to be usable, and many of them severely degrade an otherwise good solution. The optimal route in the examples is ABDFEC, but swapping A for F turns it from optimal to far below an average random guess.
The difference between ERO and the indirect one-point crossover can be seen in the diagram. ERO takes 25 generations of 500 individuals to reach 80% of the optimal path in a 29-point data set, whereas the indirect representation needs 150 generations. Partially mapped crossover (PMX) ranks between ERO and indirect one-point crossover, with 80 generations for this particular target.[1]
[1] Whitley, Darrell; Starkweather, Timothy; Fuquay, D'Ann (1989). "Scheduling problems and traveling salesman: The genetic edge recombination operator". International Conference on Genetic Algorithms. pp. 133–140. ISBN 1-55860-066-3.